Non-record: GDN-Hybrid (Gated DeltaNet + SWA) — val_bpb 1.209735 #1553

Open
Abhishek8108 wants to merge 1 commit into openai:main from Abhishek8108:submission/gdn-hybrid-architecture

Summary

Non-record submission introducing the GDN-Hybrid architecture: a Griffin-style hybrid that replaces the transformer backbone with Gated DeltaNet (delta-rule linear recurrence) + Sliding Window Attention.

Corrected val_bpb: 1.209735 (3-artifact mean, stride=512)

Layout: [GDN×5] → [SWA] → [GDN×5] → [SWA_shared] — 33.86M params, SP1024, int6 GPTQ + zstd-22. No TTT. Fixed predictor.
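The layout string can be expanded into one entry per layer to see the depth it implies. This is only a sketch: `LAYOUT` and `expand_layout` are hypothetical names, not taken from the submission's code, and reading `SWA_shared` as a block that reuses the first SWA block's weights is an assumption.

```python
# Hypothetical expansion of "[GDN×5] → [SWA] → [GDN×5] → [SWA_shared]".
# Each pair is (block_type, repeat_count); names are illustrative only.
LAYOUT = [("GDN", 5), ("SWA", 1), ("GDN", 5), ("SWA_shared", 1)]


def expand_layout(layout):
    """Flatten (block_type, repeat) pairs into one entry per layer."""
    return [kind for kind, repeat in layout for _ in range(repeat)]


layers = expand_layout(LAYOUT)
# Assumption: "SWA_shared" ties its weights to the earlier "SWA" block,
# so it adds depth without adding a second set of attention parameters.
print(len(layers), layers)
```

Under this reading the stack has 12 layer positions: 10 GDN blocks and 2 sliding-window attention positions.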

BPB Correction (from closed PR #1545)

The original submission was closed after a double-counting bug was found in build_sentencepiece_luts: the leading-space byte was included in base_bytes AND added again conditionally in the eval loop, inflating byte_count and producing an artificially low BPB.

Fix: remove the +1 from base_bytes so the lookup table matches the canonical train_gpt.py; the leading-space byte is then counted once, by the conditional add in the eval loop. Training itself was unaffected. The three saved artifacts were rescored with the corrected formula (EVAL_STRIDE=512) to produce the numbers above. Full results are in rescore_results.tsv.
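The effect of the double count can be illustrated with a toy tokenizer. The sketch below is not the submission's `build_sentencepiece_luts`: `build_luts`, `bits_per_byte`, the three-piece vocabulary, and the per-token losses are all made up for illustration, and the "add one byte unless the token starts the sequence" rule is a simplified stand-in for the conditional add in the real eval loop.

```python
import math

SPACE_MARKER = "\u2581"  # SentencePiece's leading-space marker


def build_luts(pieces, double_count=False):
    """Per-token byte counts plus a leading-space flag (toy version)."""
    base_bytes, has_space = [], []
    for piece in pieces:
        leading = piece.startswith(SPACE_MARKER)
        text = piece[1:] if leading else piece
        n = len(text.encode("utf-8"))
        if double_count and leading:
            n += 1  # the bug: the space is also added in the eval loop
        base_bytes.append(n)
        has_space.append(leading)
    return base_bytes, has_space


def bits_per_byte(token_ids, nll_nats, base_bytes, has_space):
    """Total loss in bits divided by the number of bytes scored."""
    total_bytes = 0
    for i, tok in enumerate(token_ids):
        total_bytes += base_bytes[tok]
        # Simplified assumption: count the space unless at sequence start.
        if has_space[tok] and i > 0:
            total_bytes += 1
    return sum(nll_nats) / (math.log(2) * total_bytes)


pieces = [SPACE_MARKER + "the", SPACE_MARKER + "cat", "s"]
token_ids = [0, 1, 2]
nll_nats = [2.0, 2.0, 2.0]  # made-up per-token losses

correct = bits_per_byte(token_ids, nll_nats, *build_luts(pieces))
buggy = bits_per_byte(token_ids, nll_nats, *build_luts(pieces, double_count=True))
print(f"correct={correct:.4f} buggy={buggy:.4f}")
```

With the same loss, the buggy table counts 10 bytes instead of 8, so the reported BPB comes out lower than the correct value even though the model is unchanged.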

Compliance

  • Fixed predictor, no eval-time adaptation
  • TTT_ENABLED=0
  • No SLOT, RLS, or n-gram mixer at eval
  • GPTQ calibration on model-generated sequences only (no val data)
  • All artifacts < 16MB ✓
  • Training: 590s on 8×H100 SXM per seed ✓

…09735

Moves GDN-Hybrid to track_non_record_16mb with corrected BPB calculation.
Fixes double-count bug in build_sentencepiece_luts (leading-space +1 was
counted in base_bytes and again in the eval loop). Corrected 3-artifact
mean: 1.209735 BPB (stride=512 rescore of saved artifacts). Refs PR openai#1545.